# UI Element Detection
Omniparser
MIT
OmniParser is a universal screen parsing tool that converts user interface screenshots into structured formats to improve existing LLM-based UI agents.
Image-to-Text
Transformers

microsoft
847
1,662
Paligemma 3b Ft Waveui 896
A UI element detection model fine-tuned from the PaliGemma 3B 896-resolution weights, specializing in object detection tasks.
Image-to-Text
Transformers, English

agentsea
43
6
Qwen Vl Guidance
Apache-2.0
GUIChat is a multimodal Visual Question Answering (VQA) model that understands image content and answers questions about it, optimized specifically for GUI element recognition and interaction.
Text-to-Image
Transformers

RhapsodyAI
46
2
Paligemma 3b Ft Widgetcap Waveui 448
A vision-language model based on the PaliGemma 3B 448-resolution weights, fine-tuned for object detection tasks on the WaveUI dataset.
Image-to-Text
Transformers, English

agentsea
344
6
Featured Recommended AI Models